Domain Adaptation for Named Entity Recognition Using CRFs
نویسندگان
چکیده
In this paper we explain how we created a labelled corpus in English for a Named Entity Recognition (NER) task from multi-source and multi-domain data, for an industrial partner. We explain the specificities of this corpus with examples and describe some baseline experiments. We present some results of domain adaptation on this corpus using a labelled Twitter corpus (Ritter et al., 2011). We tested a semi-supervised method from (Garcia-Fernandez et al., 2014) combined with a supervised domain adaptation approach proposed in (Raymond and Fayolle, 2010) for machine learning experiments with CRFs (Conditional Random Fields). We use the same technique to improve the NER results on the Twitter corpus (Ritter et al., 2011). Our contributions thus consist in an industrial corpus creation and NER performance improvements.
منابع مشابه
Conditional Random Fields and Support Vector Machines for Disorder Named Entity Recognition in Clinical Texts
We present a comparative study between two machine learning methods, Conditional Random Fields and Support Vector Machines for clinical named entity recognition. We explore their applicability to clinical domain. Evaluation against a set of gold standard named entities shows that CRFs outperform SVMs. The best F-score with CRFs is 0.86 and for the SVMs is 0.64 as compared to a baseline of 0.60.
متن کاملNamed Entity Recognition in Albanian Based on CRFs Approach
Named Entity Recognition (NER) refers to the process of extracting named entities (people, locations, organizations, sport teams, etc.) from text documents. In this work we describe our NER approach for documents written in Albanian. We explore the use of Conditional Random Fields (CRFs) for this purpose. Adequate annotated training corpora are not yet publicly available for Albanian. We have c...
متن کاملInitial Explorations on using CRFs for Turkish Named Entity Recognition
This paper reports the highest results (95% in MUC and 92% in CoNLL metric) in the literature for Turkish named entity recognition; more spesifically for the task of detecting person, location and organization entities in general news texts. We give an in depth analysis of the previous reported results and make comparisons with them whenever possible. We use conditional random fields (CRFs) as ...
متن کاملسیستم شناسایی و طبقهبندی موجودیتهای اسمی در متون زبان فارسی بر پایه شبکه عصبی
Named Entity Recognition (NER) is a fundamental task in natural language processing and also known as a subset of information extraction. We seek to locate and classify named entities in text into predefined categories such as the names of persons, organizations, locations, expressions of times, etc. Named Entity Recognition for English texts has been researched widely for the past years, howev...
متن کاملA Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کامل